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LEARNING ,\S A PHOBLi£>! 50LVINT, TOOL 

lillv r.. Clavbrook 
Cor.puter Scipnco Department 
Virt^inia Polytechnic Institute & State University 
Blacksburi;, Virginia 240ol 

T^.i': Dan^r pxplor^?? the use of learning as a practical tool in problem 
r.olvinr. ll-ie id^a. that loarninr, should and eventually will be a vital component 
of r^or,t Artificial Into 1 1 if^ence pronrans is pursuel. 

Current techniques in learning systems are compared. A detailed discussion 
of the problems of representinp;, tnodifvlnR, and creating heuristics Is p,iven. 
3or?e o! the cucstions asked (and answered) in the paper arei (1) how does the 
choice of repreroatation affect r-he potential for learninr.?, (2) what techniques 
havp be^n u'l-^-i to rt.itc^ and how do they conpare?» i.e. first-order predicate 
calculus vs. production rules vs. Winston's representation, and (3) exactly how 
are h*»uristics nodifi?d in the existing svstems and what do these techniques 
have in cormon? A discussion of the credit asslf!?iment problem as It relates 
to learninr under the various schemes of representation is also presented. 
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This paper is concerned with encouraging the use of learning as a 
prr>blt:^m solving tool in heuristic programs. We try to accomplish this: 
(P bv pointing ^^ut whar has heon accompl ished^ ( 2) by discussing what 
tht* m.ijor prviblems .ire, ;md (3) by showing how the problems can be 
ipprvViched , 

Tht- utilitv y^i hturistic programs depends to a large ^»xtent on the 
idtM-jmcr the heuristics omplovtd. We discuss three cv>': ont techniques 
r rt'present ins heuristics chat have been employed in succ<,;sful, non- 
:ri-;i! pr/>.:r jm environments: (I) Waterman's production rules in playing 
Jrr.v o^ikerf^l^, (2) CLiybrook's first*order predicate calculus in factoring 
iT^ul t L ; jr L iLe oolvnomials (21* and (3) Winston's representation in scene 

^ . ' :\\:'T ANT) ;^RF:SK\'T ACCOMPirSr{>tEXTS IN LEARNING 

■xce;?!: i rew leirning programs, most learning programs to date, 

v,^. Michie ind R^ss' GT Slagle and FarrelTs MULTIPLEClSj, etc., have 

bt en i?3s jcl ited with simple problem domains. Notable exceptions are SamuePs 
chr'CK-r nr ^^nm|^l7]J , W.iterman's poker programplj, Claybrook's multivariate 
pilvnomiil t ic tor iza t ion program, and Winston's scene analysis program(|2^. 
Wo teol thit further advances in learning techniques could have been developed 
in r^. r-'ulilv hid Leirning been implemented in more complex domains, Wc are not 
trvirv^ ti) reduce the importance of early research eff'^^ts, for they have provided 
ns V 1 : r \ v« 1 1 r h r) f i n forma t i on . 

M; sr I 'lrnmi; programs have used 53 ime form of general i;^ed learning pf^J 
vif^'is r ^t- loa -nin^O'O* The earlv learning programs 07^,083 implemented 
^en«;r}iL^e(i leirning by optimizing weights associated with problem variables 
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line.ir oviluati.>a functions. Samuel's program n.^t only uses rote loarnln^* 
ina gi*ner.i I L Us! !o ?rning, but U also uses book learntnfi as part of a training 
et:.>rt. Another type of Iteming that has recoivod considerable use is 
c^nce.^r learning; C^l : ^.-M^ , f 2 2^ • Towstor['203 provides soversil methods for 
pr^^iramnini^ c^uicepc - 1 v^rmat ion, Winston's program tor K^irning structual 
dt scr i pt lonci from ciiildrt^n's toys can rccofiniz^' concepts and learn concepts to 

ri'C^\iJ:n i.:t*vi . 

W-.* h.r.-v p.'J: nri/J t^^ n^ake an extensive survey ot" Ie.irninR programs for they 
ir^ .V' 11 Jt scrlbt d t lsi. wht.rc C"} » £1*^1 . . <>ne tact that prevails throughout 
■\:r-: srudv le imiHi; t r torts is that few of these learning techniques have 
'^u't n t y^I.>.f d m 3 pr ictical problems for which algorithmic solutions do 

n t oysr irt^ verv Cv^stlv. Sammet ^6] discusses several important problem 
• rt is in c ^r.puror scit^^nce where heuristic techniques could and should be employed 
! is inrt^rt-^ t ing ttiat Nilsson in his book04^ wanted to include a chapter 
';r I "^--^ ^ ! - i r.tthods usin): rrt.ich i no - learr ing techniques, but he 
c [':irA t^.^^ :ht.- subject was not yet well onough dev<*Ioped to be included in 

j.o ^ LAJOR PROBLKMS WITH LEARMING 

: ^'lis - i:>t r rncourages thf ust^ .^f learning in heuristic [urograms 
n. ^ -^Iv sr. 'uld we describe some of the major problems associated with learning, 
Htit ^rpr- the rtM^ions why people have failed to use learning. 

V'w rt' several reasons whv learning has not been utilized by problem 

s 1 /ers : 

; .. ^ n is known ^b lut learning hv the human problem solver to 



2. Thf implemt^ntJt ion o( ItMrning schtitnes, oven smplo ones, appears 
to he 1 tormidjhlo task; and In some instances it is. 

3. There is ilwavs tht- possibiltcv that tho program will not learn or 
Impr^vr with oxporioncf, and tfius not improve tUv efticioncv of 
^Dt'ratl.^n />: tlu* progjram. 

Iht' numan nr^hKn solver may vanr to solvo his problem as quickly 
IS p.vssihio is not interested in learning or its benefits. 

^^^ f"'* i J ci rt iin mount or overhead associated with learninR, 
i.r, r.l:o :^r.^etSi; ot analv:^in^; solution attempts and modifying 
'uur:s:ics requires computer time. 
An. r purp »sv >L this paper is to help remove some of the above problems. 

i '^.^ rMi.>r pr ^hlems associated directly with the actual use and implementation 
■■: [r ;rai:i^ :fchni;ue.s include: 

I. ir'Ci: n u \ powerful representation of heuristics (requirements of 
' r- ■■^r,- : \ti 'H iff ^iven in fc r. i.^n 

'-!ut:^r: :)f the c red i t -ass ignment problem for modification and creation 
■ ■'I heuristics. 

3. :^evel 'pini; _i training sequence appropriate for starting the learning 

"'^"NTTii-. I !: ion of features of th..- problem environment on which to key 
ttu* le'ornink: process. 
'> . ; < i ec r L m ^ * r" t he t ype o f h?ci rn in^ sc heme to be used • 

ibrMi^^n the learning effort (this is closely associated with the 
L' r*d i t - ignment prob lem) . 

?r ■'y'\rr,-^u listed ^hove need a c logc^r look, Tht^ selection of a 
"'■ir-s-ry' i r r,. hiuristir^ and s-lution )f rh^. cred i t-ass ii^nmt^nt problem are 

j.^cus-^efi d.'t ul in S-cti^n 4.0. The triinin^ raquence for a learnin^2: 



program must be selected caretully for it influences the program's ability to 
learn and the rate of learning. Winston C22l argues the importance of a good 
training sequence and says that the sequence should be prepared by good teachers. 
His rrmework tor learning suggests a unity between learning from examples, 
learning bv imitativ>n^ and learning by .g told. The training sequence can 
include samples tor each type of learning* Winston also stresses the importance 
u the nerr miss. A neat miss is a sample in a training sequence quite like the 
c.^nc^-^pt t.^ be learned but which differs from that concept in only a few 
sieniticant points at most. The near misses convey essential points much more 
direcrlv that repetitive exposure to ordinary examples. Concept learning 
schemes nornaUv require a careful selection of samples for a training sequence. 

Clavhrook provides several examples from his multivariate polynomial 
t ict,>ri:Mt i on program, POLYFACT, that illustrate how the training sequence of 
o^lynonials c m attect the factorization time of subsequently factored polynomials 
\::h r^t r ict.ri-itioa times in POUTACT are influenced by the training 

s-»quoncr p Ivnomials, the selection of samples is not as critical as it can be 
in >ther pr^^blem spaces and other learning programs* The reader should realize 
that the re! itive importance of a good training sequence is determined not only 
hv the learning scheme used, but also by the representation of the heuristics 
ind che c h ir ic r er is t ics of the problem environment, 

fiuman beings tend to learn to solve a particular class of problems by 
kt;vin*: '^n si*;ntficint features of the problem class^ The features are used to 
dettrmine the anproach taken in solving the problem. Unfortunately, there is no 
aat->matic vjv of extracting the significant features associated with a particular 
pr 'nitrn. Thr^ i;ame nf chess is a problem area where the features to key on for 
m.ikm^ 1 neict mn'fp. arc extremely import mt. The human must almost always select 
the features ase in the learning pr^>cess- The learning program can usually 
indiC'it*^ which features are important for learning and "bad" features can 
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be removed trotn consideration. Sometimes the devt^lopers ot" learning programs 
believe that a largo number of features will fc*ive bettor learning results; 
however, experience has shown Cl^t D^} that this is usually a misconception. 

Pr.^iH^r SK'lvctv^n .>t J tew "good*' features will normally lead to go.>d UMtning 
results. For example, only four features are used for term selection in 
pOiVp^CT :in^i onl s. ven or eij^hr teature.^ irc used for possibilitv selection. 
Waturaan's sratt- suhv^xtor for draw poker lias only seven elements. 

>' h*ctiv^n .>t the type ot learning, e.g. rote learning, generalized learning, 
or concept K-arning, is determined by characteristics of the problem environment. 
i<^Cr l/;\^rnin^ has btcn used bv Samuel with great success. His checker program 
store's records or bvjard positions and when a move is to be made the previous 
board positions ire interrogated to determine the proper move to make. 
An ob.:ous disadvantage of rote learning is that large amounts of memory may 
bt' rtquirrd t.> st.>ro the history. This is especially true if a large number 

.itLis.'Hs Iff -^^ssinU*. As in exampK- c^^nsidor the number of possible board 
p-^si:i >ns m chess >r checkers. Another problem with rote learning is that time 
must bv spent in retrieving and matching previous records with the current 
situation, Sinuel has a sophiscated scheme for retrieving board positions. 

A c ^ncept Is i classification rule. Concept learning schemes also use 
roatur^s classi:; objects. Concept learning schemes use past experience to 
clasairv .in irrst mce /)f an object as either positive or negative. This type of 
learn irv^ can a 1 5^ rf qulrf excessive amounts of memory to store the past history. 
•:onc-pt ]f»arnin^ r -chniques do not alwnvs satisfy the requirements for learning 
in a particular problem area. For example, Claybrook used concept learning 
to tr/ r-^ (i^'t-rninf ihf hf-st poss ih i 1 i t it- s (terms) tn select during the creation 

i :acr^r in :i r>^ I vn''>rnia K The results > f this effort clearlv dem<mst rated 
'The shnrtc ^nin/.s nr cr^ncept learning in this particular situation. The reason 
for t^'is .i ^/^ar '\vv)cr' mH "had" pns<; i f. • 1 i t: ies have iTianv fiintvircR in cornmon. 
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and it IS JUticult lLltsiiv' 'ii* ww^ p«sitivt' .nui Jiv Mur .is 

negative. Wha: reqiiir^-d jtvi t/ .v::t ii.i : ly used w.is .i ^t^iura I i/.t^d liMrnin^i 

rechniqiu^ t'^.tt was iblf 1 i :;^cr ir: i i t v ihv p.^ss ib H i t i o3 .uni rank ihem 

^U^bt Itviriuiiv ;>r^srams uSt' iitMu r.i:. ^: Icarniriv; schiinos (Cv>!iCt pt loarnin^ 

"rt vi.'ius rxprri-tv r ...iLhuit rT>aint liniii.; ;;;wividual roc^^rds previous solutiim 
]:t-nr>ts. Airvr-an Kui ;'lavbr-.^k j.-\^^r \ l . l^ariun^ in Lh^iir programs, 

:^u: :r ni^ .1 ; r icar i.^r;:: ^ivur.scics i:v. 'i.v mucli mort' tfian modiiviiv^ w< i^hts 
I.:'.'.-\r I i VI 1 1 1 /n : u:ic t. i .'»arf , 

-an. I.Mrn;:u. ;^r.>.>ra:-s C^3,[;"^'3 -^^t' lu^uristic St-arch programs, the 
rr^.t-rb ds : r t valuaLln^ U-.\r:\in^ art' ass.c. *r.'<l with trt^e .^r ^raph measurmionrs, 
:-^Ms-r;:^- the bMshint/Ss i prm- L r ai;c . ) ai tlit- soarch tree the path 
r; : r ^--^ : br stirr n.^i^' t^t^ - i! - .i , Th.^s^ an- firviss mc-asurork-nt s 

;.^ir. ,)r>/ri:: s ic ' ^:.V;A(:;, Liu' I^-arniin', in tacli artM 

' ' : r '/:^, r ctars. sb -aM =ic evaluated individual Iv vit h 

r* s:.* c^ r ^ tnt sensr >i direct i mi i^uat .ruidos impr.avL-mcnc through learning. 



.♦am I th^u 'S v;st'd t(> di/tcrmine modifications to 



^ ' ' ' ' :^r*^hh'ns .^t rnoditving and rr^- itin^ 
!:v';r.>'iL:^ a:s.ru^ : :rr- r r prcs^ :U i ' I - • a i l t nnan ' s pr.uiuclim rul.s, Clavhri»i»k' 
^.ir^^- rhr -r.d:c?-^ 1 c u !u s . i::d .;...s* rrpr^'S' n t .i t i . Firs? wt- di5;cus^ 

'* ' ' ■ir- ■• r -r.' > n^'^* : - - : ' - ^ .r-nt i il it I'^ar-WPr. Tli^-n i a' 

*-arh ; 'id i . i Jim ; prr^t n t at i ■ -a v^' 1.; -r r ' • ( } > tho rrprv:;onL a f i on t f 'chn i rjutv , 
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pr^l^l"^ IS tt^I irv s r > N iMu::. 'i ^'- : L-vpreSt^-iu .it ion. Fiiuil ly, wi* 



. ♦ 1 .; 



o\ tilt ihrtt^ U'U'nin^ prok^r.uns 



* I' ■ ' ' ■ :'t .tiU r t>i iMC'li author's dissortati 



on 



■ r • ■ > 



^> " i:ris: :rs is prv^bjblv t !u» kt*v to Che success 
' ♦ t::is Mr--^ . -.r 'rr J ; c^ns idi-r;?t i.^ns (or n^quiromont s) 



'"^ ' ;-.r-\si.*ni i complex actU)ns in 

■ ■ ' ■« * . ■•>-cuti-wT :>l ln urist ics sh<uilcj be- 

■ ■ ' :>^r' . r r -^f-nt u i^^n scht'nu* is that it conservt.* 

-r-^' t r i n -^^'i^i - ^ 1 t. It.ist -i p;^rti;il solution of t!if> 

■v'!ir;s'ic^ sli-ii; v.' r, i.*-, th*- rt-proscnt/it i on slinuld 

-V c^n'^rnicri ^-Mrl-^^ics from d ist in^uishnh le Cf>niponents . 

^' ■ ^ t ; s', r-. : . '.^iristics 1)*' r ! tTt^ncod /is 

' .:..^.5 r' r.t , s<,-ts - r 1' lu-ur i ;> t i c s , 

" I ,^n[c m mipulr^r ir»n rhirfn^ pr^>;:r;iTn 

■ ' r.-TiS ;<1»T I* • ■: — ^^';<i^ilitv "1" flic 'nrfScn Mon , i.o 
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the a Ml lev t.A LnttTch^inge thr conponcnts that comprise the he-uristics 
in Che evonc char heuristics are changed by the designer. 

The reader may uMnc to keep those considerations in minH uhUe residing the rest 

'I this Section. 

S.^me cht> requirements require j brief explanation. The first consideration 
is mv.tivated hv the r.'.»U-acion that m.inv the actions performed in complex 
learning programs such as POt.YFACT require a comprehensive analysis of the problem 
Aiiu.it I. y-A, i.e. SLVi-ral criCtria must he considered, often simultaneously, to 
assur- the il! prior oml it ions are sat is f ied before performing an notion. Also, the 
tir-^t c 'ns ivJt r K i sukk^-his that we ha --t .1 general representation that can be 
us.'d m 1 If irninii; svstt-n for solving v.'.rious problems. 

r-u Sfcad cnsiJ. ration does not necessarily imply that the decisions 
r- !*r.'.! t> tht- actual creation and modification of heuristics be simple; however, 
'nc. thes,. Jrcisi 'ns an reached for particular heuristics, the procedures should 
• r-c i : .n :\iijr-. mi rtlitivt-lv .s iir.p ! e to txecute. The third and fifth 

c ^ns :i»T at i ^n-; ir^- c. mp lomentary . If tht- ht-uristics arc modular, thev can be 

:>r- 3t nt. d in an encoded farm to conserve storage. This is especially important 
in nr'^t;rans that use classification mechanisms for implementing localized learning. 
Hv I xMli -pd !. arnm.:, we mean that each classification has a set of learned 
h.-urtsr-.cs r'^r s living that particular class of problem. 

^ ■luti 01 thf credit-assignment problem is included because of its 
imp -rfanc" f learning mechanisms. Representation of heuristics must enable the 
iss'Unink; cr-dit f .r success nr failure among the many heuristics of potential 
use in solving a particular problem. 

:)vn imic -Tiani r-nl It i .n, e.g. creating and modifying hptjristics during program 
.•■/'■c'it i i in rhs .Itif -linimum rrqu i rt-mf nt for a represent a t i fin . The execution 
nr.c-s'? pr-r termed mh the heuristics must for dynamic changes in hr-ur i st ics . 
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F.^r this reasv)n tht -execution procedures should execute the heuristics by using 
An interpretive process (LISP 1.5 and SNOBOL IV processors are interpreters 
^nd .ire therefore convenient languages for writing learning programs), 

^ • - Waterman^ S iYoduct ion*Rule Reoresent^i; [pn 

\\Ur*r!nan*3 n^ain interest is in devising machine- learning techniques that can 
he applied c /> tht! pr.^hlem of learning heuristics. He sees the problem of 
imr It'inent iag the mach ine- learning of h<juristics as two subproblems: 

1. Htvise J method of representing heuristics that facilitates dynamic 
-^ani f'ulat K>n by the program using them, 

■'t: -!v>p tochniques by which a program can create, evaluate, and 
m^ni i its own heuristics. 
■A itertr.an implemented his learning scheme, using a production rule 
rt'-r-s-nt It I tnr h^'uristics, in a draw poker playing program. He chose draw 
^ <^ r ^r'.: liiSr^ Lc is a aoncrivial game in which nlayers do not have access to enough oi 
riu ^ -cisting ^ame information to perform effective minimaxlag. 

^,2A Representation of Heuristics 

Vtrt'rman siys that a good representation should: 

( I ) Of rmit Separation of the heuristics from the main body of the program, 

(2) provide identification of individual heuristics and an indication of 
how rhey aro interrelated, and 

(3) be compcitible with generalized schemes* 

Two definiti)n3 ire required for the follov/ing discussion: 

HfMiriqtic Rule, A heuristic that directly specifies an action to be taken, 
rfvuristic Definition. A heuristic that defines a term. 
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As a program is executed, it goes^hrough a succession ot states as the 
value of its program variables are changed. A state vector is used to 
indicate the current values of the program variable^** When a block of code is 
executed, the effect on the state vector ^ can be described by ^' = f(oO, where 
i,^ the resulting state vector and f represents a block of code. A heuristic 
is rt;presented :is a rule of the form S-^ T where S is the current state vector 
md T is the vector containing the mapped components. The rule can be thought 
oi as 1 specification of how a state vector can lead to other state vectors. 
An cx.imple^ is 

oi = (A, O— ^(g^ (o<), g2 (ot), g3 (oc)). 
Ihc lunction g, changes the value of A to gj (o^;, B becomes g2 (^)» and C becomes 
(^^^ The items A, B, and C represent: sets of values and not individual values. 
Thus, 1 s incite state vector such as (A, B, C) represents a number of states instead 
>r a single state, thereby reducing the total number of heuristics required, 

A file >t the type S^I,, where is a situation defined by vector 

variables and is the definition of the resulting situation. Production rules 
this form are called acti on rules. A heuristic definition can be represented 
by a production rule of the type Z Z' , where Z is a value of a state vector 
/triable and Z' is either: 

1, 1 value of a state vector variable and an associated predicate 

(called a backward form rule), or 
1. a computational rule for con4)ining variables of the state vector 
Tea I led a forward form rule). 
An example of a backward form rule is Dl — ^ D, D>20, meaning that D is considered 
a nember ) t the set Dl if the current value of D is greater than 20. An exomple 
>? I i^rvird f-jrm rule is Kl * A, meaning that X is defined by the 

arithmetic expression Kl * A. 

The examples In this section are taken from Waterman l^2Q* 



Ihe state vect.^r has three types of variables: (I) bookkeeping variables, 
-/r ide I record past experiences; (2) function variables, whicf' 
rrsr'p.c ;ir itrTinot ic express ii-)ns containg state vector variablesj and (3) ■ 
1/ inic var ;ar>lt*s, winch either directly influence the decisions ot the pr.»^r.irr, 
■r ci.a-.-r m value as a direct result of these decisions. Only dynamic variables 
.r 'J : ::e descriptions which represent the left parts of action rules, 
i: ; rl' d.ca-uc and function variables are used in the right parts of these rult^s 
! -kk- mr.^' Mriables are used in the definitions of the forward f.^rr. ruKs. 
' ^ .a.n .::si*;ht Into decision making through the use of Waterman's 
r:- :c:' . rulc S v^- consider the foUviwing example from ais paper. Let th^- 
- "C^ r (3 no tf!" f blowing: 

P (A, B, C) - (a, b, c). 

. ir • .:/:iir:.c /ariables with the current values of a, c, rtsru'Cti .fi. 

•■r' ; i: r t'^:** I' lll 'vin^ simple heuristics: 

I* A is an Al, then add X to the value of B. 

1 1 A iS an A2 and C is a CI, then subtract Y from the value of C, 
3^ fr H is 1 Bl, then add Y to the value of C. 

A in A I when A > 25. 
V A :s an A2 when A < 25. 

■ i s .\ I when B > 1 , 
?. ^Ws a IV:! when B > 4, 
IS a CI when C = 5. 
^' I nc revises is D increases. 
•Terr'as»'S \s I decrease*}. 



V c>>r rcsp.>nvi ini; r-r.^duction rules are 



M A, A ^ 25, 

a: a, a < 25, 

^> . 1 ri , B > I , 

>. X K I I), 

K2 ' (K3 * E), 



act ion rulo , 
act iv'^n ru le . 
act i on rul . 
backward torm rule, 
backward torm ruU- * 
backward form rule, 
backward form rule, 
backward form rule* 
forward ft>rm rule . 
forward form rule. 
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i.J iTv ihc following rules, one for each element of the subvector: 

A^ a, a f ^set of allowable values' for A^j backward form rule, 
'i^h, h % £set of allowable values f.^r bJ, backward form rule, 
c f [set of allowable values fr>r C^^ backward form ruU:, 
kk^cping variables, and X and Y are function variables* A star (*) 
['■': an action rule indicates that the variable in question is 

■ • wirh r(>;ard to that particular situation description. A star (*) in 
^ h.ind ^i.io of an action rule indicates that the value of the variable in 
r'^.a.r's -ar.cfiani^ed . Thus, the action rule 

(Al, X 4- h, means, if variable A has the 

;en increment value of B by X. 

the program is done in two steps: 
1- R ich eNmt nt of the current program subvector is matched against all 
ri.'.h^ ^:d' q af the backward form (bf) rules. W}w.^n a match occurs (the 
[■:r''^\ ic\^ ( \<; sitisfied), the corresponding sid^- of that bf rule- is 

-narc-v fi a^Minst all right sides of bf rules, etc. ^ until no more matcht'S 
car ^"»und. The resulting set ;^f s^/mbols d^fin^-s a s'/mbolic subvector. 



2. The symbolic aubvector derived in (1) is matched against all left sides 
of the action rules, going top to bottom, and when the first match 
is found the values of the program subvector are modified as described 
by the right side of the matched rule. 

To illustrate decision making consider the following example: 

Let the subvector have the values a»4, b^SjC^S, the constants 
have values Kl - 1, K2 - 20, K3 » 3, and let the bookkeeping variables 
have the values D « 7 and E ■ 8. Then p» (4,5,6). 

Step I is started by comparing a - 4 with each bf rule predicate, the 
predicate being satisfied only if it contains the symbol a and is true 
when a is set equal to 4. Thus a = 4 is found to match rule 11 and no 
others. Next A « 4 is compared to the right hand side of all bf rule 
predicates and is found to match only rule 5. Finally A2 - 4 is 
compared with all bf predicates and since it matches none of them the 
search terminates leaving A2 as the final symbolic value. Elements b 
and c are processed in the same manner and the symbolic subvector that 
results is ( (A2), (Bl, B2), (C) ). This subvector is a description of 
all situations in which variable A has the symbolic value A2, the 
variable B has either the symbolic value Bl or B2, and the variable 
C has the symbolic value C. 

Step 2 now consists In comparing the symbolic subvector ( (A2), (Bl, B2), 
(C) ) with the left side of each action rule until a match It found. 
In this case a match occurs at rule 3. The program subvector Is then 
set to the values specified in the right side of rule 3. Hence the new 
B equals (4,5, (20 - (3 * 8)) + 6) or (4,5,2). The program makes one 
external decision for each search cycle. Thus in a game-playing task 
the program would execute one search cycle each time it made a "move". 



The subveccor Waterman used for the game of draw poker is composed of the 
dynamic variables of the state vector and has the form: 

B - (VDHAND, POT, USTBET, BLUFFO. POTBET, ORP, OSTYLEX 
where VDHAND la the value of the program's hand, POT is the amount of money In 
the pot, LASTBET is the amount of money last bet, BLUFFO is a measure of the 
probability that the opponent can be bluffed, POTBET is the ratio of the money 
in the pot to the amount last bet, ORP is the number of cards replaced by the 
opponent, and OSTYLE is a measure of conservative style by the opponent. 

4.2.2 Program Creation and Modif:ication of Heuristics 

To create heuristics (either by modifying existing ones or hypothesizing 
new ones). Waterman uses three pieces of information: 

1. a good decision for the situation, 

2. the subvector variables relevant to making this decision, and 

3. the reason the decision is being made. 

This data la called the training Information . Item (1) above is called the 
acceptability information , item (2) is called the relevancy information , and 
item (3) is called the justification information . The training Information 
is either supplied by a trainer or obtained by the program during execution. 
The training information provides data for the construction of a new action 
rule. The acceptability information supplies the right part of the action 
rule, and the relevancy and Justification information supplies the left part. 

When the existing action rules lead to a poor decision, they are corrected 
by incorporating the training infonaatlon Into the production rules. Either an 
existing action rule (target rule) Is modified to catch the symbolic subvector, or 
if a rule appropriate for modification does not exist, the training rule Is 
inserted in the action rule list Immediately above the error causing rule. 
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An acclon rule is appropriate for modiflcaclon If it has the same form as the 

training rule. Two action rules have the same form only if (1) their right 

pares are identical, (2) their left parts have corresponding and (3) 

their left parts have symbolic values which correspond to the degree that they 

are both defined by the same logical operator. 

An example (taken from Waterman illustrates how an action rule can be 

modified to catch the symbolic subvector. The training information is : (1) a 

good decision to add 2 to the value of B, (2) the varicbles relevant to this 

decision are A and C, (3) the decision is being made because the current value 

of A is small and the current value of C is large. The program subvector is 

(5, 3, 13). The training rule (action rule) created from the training 

Lntormation, and the associated backward form rule are: 

( Al, *, CI )-^ ( *, b + 2, * ), 
Al A, A<6 
CI C, C >12 

The existing production rules are: 

1. (Al, C2)— *(*. b +2, *). 

2. (Al, Bl, *) (*, a + 5). 

3. (A2, *, C3)-* (*, b + 2, *). 

4. (Al, *, *) (*, *, a + 5X 



5. Al A, A<6. 

6. A2 A, A<8. 

7. Bl B, B>8. 

8. CI C, C >12. 

9. C2 -••C, C<5, 
10. C3 C, C >13. 



Rule 3 Is the only action rule which has the same form as the training rule 
(Al, *, Cl)-*(*, b ■♦• 2, *). The symbolic subvector obtained through parsing is 



Fage 16 



( (Al, A2) , (B), (CI) ), which catches on rule 4. Rule 4 leads to an 
unacceptable decision (the only acceptable decisions are those supplied by the 
training information ). Rule 3 leads to an acceptable decision and has the 
same form as the training rule; thus, it is used as the target rule. The left 
hand side of the training rule, (Al, *, CI), matches the left side of rule 3 
except for 03. If C3 in rule 3 is replaced by a symbolic value representing 
a set large enough to include the current value of state vector variable C, 
the symbolic subvector obtained through parsing will catch on rule 3. 
Therefore C3 is replaced by CI, changing rule 3 to(A2, *, CI)— ^ (*, b + 2, *). 
Waterman gives the following training procedure outline: 

1. Parse the program subvector to obtain the symbolic subvector. Then 
drop this symbolic subvector through the action rules to obtain a 
decision. If the trainer Indicates that the decision is acceptable, 
then stop; otherwise, go to step 2. 

2. Obtain the training Information from the trainer and use it to construct 
the training rule. If this information changes the symbolic subvector, 
then go to step 3; othervlie, go to step A. 

3. Drop the new symbolic subvector through the action rules to obtain a 
decision. If the decision is the one sought by the acceptability 
information, then stop; otherwise, go to step 4. 

4. Locate the error*causing rule, the action rule responsible for the 
unacceptable decision made in step 1 or step 3. 

5- Search the action rules above the error-causing rule for a target rule, 
a rule which has the same form as the training rule and is suitable for 
modification to catch the symbolic subvector^ If such a rule is found, 
modify it to catch the symbolic subvector and go to step 3; otherwise 
go to step 6. 



17 



6. Search the action rules below the error-causing rule for a target 
rule. If (I) such a rule Is found, (2) the error -causing rule la 
suitable for modification to pass the symbolic subvector, and (3) 
the rules between the error -causing rule and the target rule either 
pass the symbolic subvector or are suitable for modification to pass 
it, then modify the target rule to catch the subvector, the error- 
causing rule to pass the subvector, and the rules between these two to 
pass Che subvector, and go to step 3; otherwise go to step 7. 

7. Place the training rule immediately above the error-causing rule in the 
list of action rules and stop. 

The learning in Waternan's program is in two forms: (1) learning with 
explicit training, or (2) learning without explicit training (implicit training). 
When the program learns without explicit training, the program itself must 
develop the training information during the course of game play. The accept- 
abilitv infr^rmation f.^r Implicit training can be obtained through logical 
deduction. This process uses: 

the rules of the game, 

2. statements (or axioms) about the game, and 

3. general statements about techniques used in game playing. 

The result Is n set of logical statements from which new statements can be deduced 
using deductive inference rules. The reader is referred to Waterman's paper for 
an example of an actual deduction. 

The justification Information for implicit training can be obtained from a 
decision matrix that is game -dependent and is given to the program before learning 
starts. Each row of the matrix stands for a game decision, and each column stands 
for a subvector variable. Each entry F- • In the matrix is an expression whose 
value Is an attrlbu ;i of the subvector variable j. 

The relevancy information for implicit training is obtained through the 
generation and testing of hypothesis concerning the relevance of subvector 
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variables* Reasonable hypotheses are solved for in the following way: 

1. Let the initial hypothesis for each rule be that all subvector 
variables are relevant. 

2. Hvpothesis testing then consists In noting whether or not a 
particular training rule, placed in the set of action rules by 
step 7 of the training procedure, catches the symbolic subvector 
when the action advocated by the rule Is determined to be the 
correct decision. 

3. [f the rule does not catch the subvector, the relevancy hypothesis 
Lv^r that rule is changed. As many variables in the left part of the 
rule are made irrelevant as is necessary to make the rule general 
enough to catch the subvector. 

• - - 3 ?^^">gram Kvaluation of Heuristics 

Program manipulation of heuristics requires facing two major problems: 

1. evaluation of existing heuristics in terms of their usefulness to the 
program, and 

2, creation of new heuristics, by both modifying old ones and hypothesizing 

new ones, 

make a decision via production rules for a problem (1), a symbolic 
subvector representing the game situation is compared to all left parts of the 
list of action rules, going top to bottom until a match is found. The action 
rule which defines the decision, that is, one whose left part matches the 
symbolic subvector, is easily located. After the decision is evaluated, the 
credit or blame can be assigned to the action rule, and to those above It, 
which defined the decision. Here blame is assigned to action rules leading to 
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poor decisions^ while action rules leading to good or acceptable decisions 
are ignored. Assigning blame to an action rule consists in modifying the 
rule enough to avoid a repetition of the mistake or poor decision just made. 

4. 3 C laybrook' s First -Order Predicate-Calculus Representation 

Complete details of this representation can be found in Claybrook 
[33. This representation is implemented in a learning program that performs 
the non-trivial task of determining the symbolic factorization of multivariate 
polynomials with integral coefficients and an arbitrary number of variables and 
terms. The author agrees with Waterman that the representation of heuristics 
determines directly or indirectly how well a program can learn. The representation 
In the learning program, POLYFACT, was chosen because of the expressive power of 
tht' predicate cilculus. We were primarily concerned with using learning to 
improve the efficiency of operation of POLYFACT. 

4.3.1 Sepresentdtlon of Heuristics 

The notation is identical to that of first-order predicate calculus except 
tcr n minor difference involving domain specification for the assignment of 
VI lues. Tn the implementation of the predicate calculus notation, a heuristic 
can have one of two general forms: 

ft) N^AME (DOMAIN^) (DOMAIN2) * * • (DOMAIN|^) ( (ANTECEDENT^ C CONSEQUENTx) 
(ANTECEDENTf^ C CONSEQUEOTq) ) $, or 

(2) NAME ((ANTECEDENT| C CONSEQUENTi) 0 (ANTECEDENTj C CONSEQUENT2) 
(A^JTHCEDENTr^ C CONSEQUENTS^) )$ 

In either of the above forms, the same antecedent or consequent can occur 
several times; but the same antecedent -consequent pair should occur but once. 



The rirsc f^^rin has a non-null domain; whereas, the second has a null domain. 
One or the tunctiv>n3 ot the non*null domain is to specify an ordered set from 
which the values tor the variable (indicated in the domain field) are taken, 
5 v^me >t the variables in tnc antecedent -consequent pairs can be free, i^e, 
their values .ire specified elsewhere. Each bound variable must appear as an 
argument in at least jne antecedent or consequent, i.e. each variable specified 
in a domain must appear as an argument in at least one of a predicate, a function, 
jr a consequent. The domain as defined in this paper corresponds to the 
quantifiers in predicate calculus notation; however, in predicate calculus 
notation the domain is not included as a part of the quant if ier* The order of 
domain precedence is identical to that of the quantifiers. 

Kach antecedent is a single predicate or a logical combination of predicates 
connected by conjunction ('A' = AND) and/or disjunction (•0* ^ GR) operators, 
hiach nredicate is a logical function and can be referenced with arguments that 
if. c nstanis, variables, or functions. is the conditional operator, and 

the consequent is always the name of a routine (or procedure) that is executed 
when the corresponding antecedent is satisfied. 

To illustrate the representation of heuristics in the predicate calculus 
notation, use \n example taken from the term selection heuristics in POLYFACT: 

MM T TN' I?TRSO) ((N H1(G11(T), MINDEG) C FIX123)) $^ 

This heuristic consists of the components: 

'1.1 is the NAME of the heuristic, 

ir: : J^: IPTR'^^O) is the domain of the heuristic, 

is the negation operator ^ 

HI is a predicate that is 'TRUE' if Gll(T) 

equals MINDEG, 

f'H is a funct ion whose value is the degree 

of term T, 

r is a br^und variable* 
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is a constant t unction , i.e. a function 
whose value i'i constant during the exe- 
cution of the heuristic, 

^ is Che conditional operator , and 

is ) CONSEQUI-NT. 

Interna I Iv, the prrdicate calculus heuristics are represented as linked 
list!* with f ich individual atom stored in a separate cell in the list. 

The hfuristics are executed by an interpreter. During execution, the 
predicat;' calculus t.^vr?. is translated into reverse Polish notation. Then the 
rev.TSe ;\«lish string is executed with references to predicates and consequences 
causLrm t-it fxecuci-in the corresponding procedures. 

Iho n.aristics that contain non-null domains select elements from the sets 
mv-jn in domains. In the selection of elements from a set, a heuristic can 
c^nsidt-r ill elements in the set. In this case, the domains have the form: 

( E T IN IPTRSO ), 
A-^.t-rf LniiciLts that .2 II elements (7) in the set [PTKSO ,-ire selected 

during the execution of the particular heuristic. A heuristic with a non-null 
domain can also consider elements from a set until an antecedent is satisfied. 
The corresponding consequent is then executed and activation of this heuristic 
is terminated. This type of domain is represented as: 

( EA T IN' TPTRf^n ), 

whore -A indicates that some (possibly all) elements in the set IFTRSO are 
selected. '« 

A heuristic with multiple domains is executed by selecting elements from 
the innermost domains first. This execution has the same effect as nested loops 

in pr-n/rimm in^ languages. 
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h:i)igrain Mv'ditlcatton and Creation 

First we explain how thd creation and modification process works, and then 
we describe the training procedure for POLYFACT. Ttje reader saw in Section 4.2.2 
that during perivids of implicit training in Waterman's program, his program 
t'nplovs 1 decision matrix (created by a human prior to implicit learning), 
the Itarning schemt in POLYFACT uses a set of tables to specify relationships 
between predicates, consequents, and domains. The tablf-i are pre-complled by 
hand and re id from input cards and stored in the tables. 

The consequent - predicate table gives the correspondence betwv'cn each consequent 
and Che prvdicates that can be used to form an antecedent -consequent pair. 

<^ ^"sequent - d.)ma in type table specifies the correspondence between each 
c-nsoquent and the sets from which values for a variable are selected. Each 
h.iund varLible must be an argument in a predicate (within an antecedent) or 
c.Tnsequt^nr . Ihe domain type - variable -set table defines the variable-set pair 
ass-ci.ited with a domain type. The domain is determined by the variable and the 
sot fr >m which the values of the variable are taken. The purpose of this table 
is t ' prevent heuristics with a given type of domain from using predicates and 
consequents associated with another type of domain. In addition this table 
could pre-A'": the creation of heuristics which have a certain mixture of domains. 

The reader should note that a domain is a set of values (represented In 
POLYFACT as a linked list of values), a predicate Is a logical function 
/'reprosentfd in ^OLYP'ACT as a logical procedure), and a consequent Is an action 
to be taken (represented in POLYFACT as a procedure). 

The heuristics in POLYFACT can be maintained either In first- ^^rder predicate 
calculMS notation or in a combination of first-order predicate calculus notation 
and an t-ncodini? of the predicate calculus notation. As we describe the learning 
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associated with term and possibility selection we will describe how the 
predicate calculus is encoded* 

The learning (through the modification of heuristics) associated with 
tertn selection is as follows. After a successful factorization attempt is 
completed, the number of possibilities (factors of a term) in each term of the 
polynomial is determined. The features of the term(s) with minimum number of 
possibilities have their frequency count (s) increased. Each feature has a 
predicate associated with it. The predicate is true if the term has the particular 
feature and false if it does not have the feature (features used are degree of 
term, number ot variables in term, etc,)* The frequency counts associated with 
each texture are examined to determine whether or not the set of heuristics for 
term selection need to be modified. The heuristics are ordered to impart the 
importance ot features for "good*' term selection. If two or more features 
have identical frequency counts, then they are of equal importance in selecting 
1 ttrm. Thus* in predicate calculus notation this would result in an antecedent 
bavin^^ tw> predicates (corresponding to the two features) connected b •OR*. 

Since F'OLYFACT uses a classification technique to implement localized 
leirning for term selection, the term selection heuristics are not maintained in 
predicate c.ilculus notation (because of the amount of storage space required) « 
Instead, the term selection heuristic are encoded into a small ordered list of 
words. Each feature is represented by a particular bit in the word. The presence 
of a 'r in that bit indicates the presence of the corresponding predicate in 
the heuristic. In this way a single word describes the entire heuristic. 
Not only does this encoding save considerable storage space, but it Is much 
easier to modify heuristics using a numeric representation than the symbolic 
representation of predicate calculus* Prior to execution the selected heuristics 
are expanded into predicate calculus notation for interpretation. 
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The poPsibiUties (terms) that can be selected as terms in a factor of a 
polvnomial are ranked according lo their probable merit. During a factorization 
attempt, the higliest ranked possibiKties are selected. After a polynomial has 
been factored^ each term in the factors is examined to determine its set of 
characteristic features. A binary vector is created with nonzero entries 
i.adicatLa>; the fcMtur^-s present • Then a heuristic is created (unless one already 
exists > usine. is predicates those that correspond to the features present. 

T) facilitate the construction of this set of heuristics, a matrix is 
maintained prv/iding a history of the features of terms that have appeared 
In factors u prevl.^us p.^ Ivnomia Is , After the vector of features has been 
cr-zated for a term, it is compared with each row in the matrix to determine if 
the vector is already present. If so, the frequency count for the matching row 
is increTnenrrd (the frequency count is kept as an augmented column in the matrix). 
If rhe vector is not in the matrix, it is added and the corresponding heuristic 
is cr<Mtf'vi . 

Vhrn tht^st' ritniriscic^ <ire used to rark terms, the satisfied antecedent's 
(ir tr:erc is i satisfied antecedent) frequency count becomes the rank of the 
rvrm. Tnese hc-uristics are maintained in predicate calculus notation and also In 
thf enc. ded mjrrix form* The matrix form is convenient for determining the 
need ^" ^r n.K] ; I icat i >ns. All classes of polynomials use the same possibility 
select i)n heuristics. Modifications range from adding a predicate to an 
exist mil? antecedent to adding an entire antecedent-consequent pair, 

Icuristic^ cm be created and modifLed during the training period or later 
when no explicit information is given POLYFACT. During the training period, 
pol /n.'mi^ls irt- input r ^ POLYFACT along with information giving the number of 
tt-rms i:i *-\\ch >: fhc two f;ictors. In this training period the polynomials are 
cIissirifKl md h*Mirisrics ire created for term selection and possibility selection. 
Polvnomials in th»^ rr, lining sequence d.'> influence the fac tor I ^.at inn times of 

^ ft 



Page 25 



subsequent Iv tactored po lyn.)mials , but similiar looking po lynv)mia Is have so 
varied characteristics that it is difficult to select a good training sequence. 
However, after the training period is over, if the polynomials have many 
characteristics in cotmnom then the program adjusts the heuristics to reflect 
this. During the non-training period no helpful information is given to 
POLYFACT. Also learning can be turned off completely at any time. 

4.3.3 ^^^>grani Kvaluation of Heuristics 

Assigning credit or blame to a heuristic in POLYFACT is a much simpler 
task than in Waterman's program, and the capability to reference heuristics 
individually by name provides the ability to do this. 

Credit is ^iven a heuristic by increasing the frequency count associated 
with the heuristic. Blame is not so easy to interpret. Blame can be interpreter* 
when the tv/rm selection heuristics do not select the *'best*' term to initiate 
tht; rictorizaLim process, and when the possibility selection heuristics do not 
rank the poss ih i I it ics so that only the highest ranked ones appear in the factors 
ot a successfully factored polynomial. 

Pi/aluation of heuristics can result in re-ordering heuristics, adding 
predicates to antecedents, etc. Term selection heuristics can be modified on 
either success Lul or unsuccessful factorization attempts, but possibility 
selection heuristics can only be altered after successful attempts* 

^ '^i-Hflton's Representation 

Befnre wp discuss Winston's learning system (or more properly his language 
md nnrTM-)n for drscrtbin^' scenes), we describe, in general terms, whnt his 
system does. Winston's program analyzes scenes consisting of the simple objects 



pu:vi in a child's t.)y box. The description of a scene is in tt?rms the 
b]vcis chat make up the scene, 

;:eneration ot a scene description begins with a drawing of the three* 
in^.ensi.in.U scene. The drawing is conmunicated to the machine using a 
r Kiam together with a special pen whose position on a tablet can be read bv 
7\:\q\.i:\v Jir^-ctlv. Then a program classifies and labels th<' ^/ertexes, 
>^raT, r tien creates names for all of the regions In the scene. 
V scriptDiis )t scenes are stored so that they can be easily retrieved, 
ich 'h;r'ec iw 1 scent.^ is naturally thought of in terms of relationships to other 
''ve:.s vAd to descriptive concepts like small, square, etc. Thus, Winston uses 
tKs t ) st^rrj the scene description. The network resides in the data base 
rhr r-rn -r li55t structures. For example. In Fig. 2 the nodes represent 
'C'rs Kid thf p >iaters represent relations between objects. 
The descriptions permit one to compare and contrast scenes through programs 
. -np ir- in l c ntrast descriptions by retrieving the descriptions from the 
rw r<, Thf descriptions should be similar or dissimilar tn the same degree 

■ r ^h' ^c^'i^'S thf/v represent seem similar to dissimilar to human intuition. 

r tw> scenes are described and corresponding parts related by a matching 
' ..r iTT!, Jirferences in the descriptions must be found, categorized, and them- 
1 > tescrined. Later sections describe how the matching of scenes Is used to 
Mie models of scenes. 
Ident i r ic<it ion of scenes Iscarrledout as follows: compare the description 
^ ,rTV' sc^H" t > be identified with a repertoire of mode Is or stored concepts . 
wrv IS .1 method of evaluating the comparlsona between the unknown and the models 
truir some rr^.irch c:in he defined. The Identification process in Winston Vs 

■ jr K'! ]:^. \ -r^i ^r probN^m area. It is comparable to determinini; whether )r not 

.*,r iphs ?rv i3:»morphic. 
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The next two sections provide more details tor the construction of 
Tu U^'U ( >r concepts) and the language used to dt^scrlbe the models. 

* ^ • ' '^ opresencac log Models 

It is .iitiicult CO talk about Winston^s learning system with respect to 
his r. prest^ntat Lon ot heuristics. He stores learned Information in a network 
tnuMie 1) desi:rihttd by a language expressing relations between objects in a scene, 
•ir r^tjJel reprtsonts (or is) the concept. During the building of a model of a 
scM such as that in Fig. 3, Winston's program is creating a concept to 

Vhrtv is A slight difference between a description of a particular scene 
i;ui 1 noJ^ ! o: 1 concept. A model is like an ordinary description in that it 
c irru vs inr urination about various parts of a configuration. But a model also 
r:<hihits inJ indicates those relations and properties that must and must not be 
1:: : . . it nci in inv cxanpU^ of the concept involved. 

In «rdtr t ^ develop the representation of models, we use the pedestal 
tr iuunt; sequence in Fig. 1. Then we describe a pedestal description and a 
mo<\rl in riL-. : ind Fig. 3, respectively (these examples are taken from 
Wmst infjT) ) . Thi' first step Is to show the machine a sample of the concept to 

; irrv The rost ai the samples are near misses (a near miss is a sample in 
1 tr lining sequence 1 ike the concept to be learned but differs from that concept 
in nlv ^ few significant points). The near misses simply refine the description 
'.>i *:\)^* ptdrst/il to th(^ point where It is a model of the pedestal. 

The scc^^nd snmple in Fig. I Is a near miss due to the absence of the 
.rt^^^ -hv relation (a description of how the relations In a scene are 
.1' roraiinrd is -^A vv.n in the next section). The other samples strength the other 
r lofivns in rhr description and finally turn the pedestal description In Fig. 2 
int, ' rhe modf i in Fi><. 3. The training sequence In Fig. 1 is revisited in the 
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s-cti •-. crtMtio!-' and mod i f icat i>->n ot" modfls. 

.'his -^.^.-r, i-A w.is intt^nded to give a brief view of the represtnitat ion of a 
A. 'ntd earlier that the model is rt^presented internally as a 



.V^'^iiticdtion ot Models 



■■■■■<.■ .'i.lv'-' 1 



It - S 



Ire adv discussed the difference between a scene description and 
•• s stcti.<n we describe in detail how a model is created and 
ir i training sequence. We use the pedestal training sequence in 
: r boc.w.e specific on the development of the model in Fig. 3, 

• ul.i c.tnsider a more general description of model development. 

■ 'uLldinx program starts with a description of some example of the 
1 irn.'d. This description is the first model of the concept. 

thf development of a model sequence where there is only one 
n 'Au' current model and the description of a new sample. 

iHs to a new model, Winston's program compares the description 
cro current model to determine any dlf ference(s) . We did 

First Model 
Second Model 
Third Model 
Fourth Model 

■ - Vodet Development with Only One Difference, 

' '■■ Section 4.4, but during the development of the description 

1 scnnt . .Ach seem, is analyzed to determine the relations that objects in the 

>-^'.r)t.' program exists for detecting each relation, i.e. a heuristic 

•^r r;fici]ly for detecting the existence of the SUPPORTED-BY 
:•; ^-RorJT-OF relation, etc. 



-mrlo 1 
■^ i":plo 2 



erJc 
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Sr'veral ditferences may occur between the current model and a new sample. 
Then several branches may occur and we have a tree of nodes as given in Fig. 5, 
The alternative branches come about by the program selecting one branch at each 

First Model 

Second iModels 
Third Models 

Fourth Models 
S M;>del Development with Several Differences ^ 
noinr ror further do vt' lopment . The path leading from the top of the tree down to 
trhe current inociel is called the main line . The main line changes course when a 
pirticiilir sequence of branch selections leads to untenable situations. 

^he nr^^ram has to deal with alternatives to the main line of model 
Je. ! prr^ent. Main line assumptions may lead to contradictions which in turn cause 
the model building program to retreat up the tree and attempt modeil development 

it Che differences have multiple interpretations or more than two differences 
>ccur, the number of possibilities can explode. The machine must decide which 
mt^rpr'^^t u i n )f which differences are most likely to cause the near miss. 
The machine first forms two lists: a primary list and a secondary list. Each 
interpret It i )n eventually ends up in one list or the other. Some interpretations 
can never make the primary list because they are unable to explain why a given 
Simple i3 a near miss. All of these interpretations go immediately to the 
secondary I i st . 

The next way to sort differences is by level. This assumes only that the 

differences nearer the origin of the comparision description are the more 
Important. The pr )gram determines the depth of the remaining nodes which are 
neare?*r the ^ri^in of the comparison description* All those candidates found at 
i^reater depth are placed on the secondary list. 
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Tho primary ditforence list allows the program to torm a theory of why 
Che near miss misses and what to do* This theory (or hypothesis) specif it* 
one difference as the single cause of the miss and specifies which interpretation 
of that difference is issumed. The differences at the same level are ranked 
according to type. Then the one with the highest rank is chosen as the cause 
of the near miss. Winston provides a table spec i fv ing ,a priori, differences 
and their possible interpretations, 

Now we return to our discussion of the pedestal training sequence and model 
in Fig, 1 and Fig. 3. Reviewing very briefly, the model building program begins 
with a description of the concept to be learned (in Fig. I), The second sample 
in Pi^, I is a near miss because the supported-by relation is missing. Thus, the 
machine can only conclude that the supported-by relation is necessary and a new 
model is developed with a must*be-suppor ted* by relation. We see there is only a 
single difference between the second sample and the current model. Samples three 
rhr^M>:h five strengthen the fact that the support is standing and the supported 
object is J lying board. In particular, sample four strengths the relation that 
the supported object is a board and sample fiv. strengthens the fact that the 
board must be lying. In samples four and five there is only a single difference/ 

The strengthening of the relations in Fi^.^ 2 by the training sequence in 
Fig. 1 results in the pedestal model in Fig. 3. 

The reader should be able to detect a note of importance to the development 
of a training sequence for model building. Winston stresses the Importance of 
a good teacher both in human learning and machine learning. He says that in the 
past history of machine learning the use of a teacher was considered cheating, 
and machines were expected to self organize themselves, Winston's training 
sequence sample selection is probably more critical than the tr<iinlng sequence 
selection in Waterman's program, and it certainly is more critical than in 
Claybrook's POLYFACT. 
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The subsection on program evaluation of houristics tor Winstv^ii's svst»'m 
is emitted since this notorial is dusctisst»d above in the ^trve li^pmcnit oi rm^dols. 
This omission brinies up intt'rfc»st ing pv>int - somo ItMrnin^* sviilx.u\s con hv 
subJividt^d into verv clear cut subdivisions, while others cannot^ 

'* • ^.onrnon . harnc t:<. r is t ic s of the Fhreo Techniques fot 
K epro s^-. u Ln £ Hour ist Ics 

vVte characteristic cotnmon to all three techniques is that a powerful 
lan^-uue is used for each representation. This is an especially important point 
• 'ecause the e.irlv learninjj; programs lacked powerful languages for heuristic 
representation, and it is the author's contention that this is the reason for 
int [ ick si^niticant advances in learning during the I9b0's. Another point, 
' isL^ric.iI in nature, is that all three techniques were developed in the early 
I97nvs fi.e, all three dissertations were completed during this period). 

Another common characteristic is that all three techniques were employed 
c omplex pr>b!eTn spaces vr-rsus simple game problem spaces. 

Fach representation language is separated from the program code, i.e, the 
heuristics in each program are separated from the program code, thus allowinp 
t'ltm t) be manipulated dynamically. Also the representations are modular in 
nature allowln^^^ the heuristics to be easily created and manipulated. Although 
rhv ipproach in each case differs, each technique uses a form of generalized 
leirning. Credit assignment occurs in each technique and is discussed in the 
next Sec t ion , 

All three techniques for representing heuristics have most or all of the 
requirements ^f heuristics listed in Section 4.1. The readier may want to scan 
tfiis li^r again and consider each representation as he dc^*'S so. 

Another thing common to all three techniques (and also common to all other 
pr^ vi Mjsly implemented learning systems) is that a change In problem environment 
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rc.}uiri-3 sotnt- etl-irt -i-i thf nart ot the user t.i d»'tv'rnuti,' (v-SfiiMi tuiiristics 
:.>r solving che now problems (or at lease dect-rmino U-.. lures tor koyini; on 
during learning^ and in some cases providing the learning svHtern, a priori, 
with rules or other intormation about the problem area. In Waterman's y.^kvi 
orogram. rules of the game are supplied and also a precompiled decision matrix is 
supplied, Cl.ivhrook supplies tables tor controlling the construct i. n oi 
heuristics, and Winston provides tables listing one or more interpretations 
r T e jch di::'rrence in scene descriptions. At the present time, it is nearlv, 
i: n,!L completelv, impossible to develop a learning system that can .operate on 
ar I i-as pr.ihlen areas without some inrormation being supplied to the svstem hv 
user. in Section 5.0 we discuss how to organize a learning system so as to 
redact- the etf ^rt in moving from uue problem domain to another. 

riie last c miparison of the three representation techniques is vitii respect 
their p-ver >f representation. The predicate calculus and production rule 
r- .^r-s.' >L t: 1 languages appear t i be the more powerful ianguage.s i-^r 
r.oresenting complex actions. The network model approach is a natural approach 
r-:'rr sentat ian but It requires efficient retrieval and matching procedures to 
nractica!. Winst >n acknowledges these two problems in his dissertation and has 
"s')d" s.lution to them. Barrow [1^ has made progress in structure matcliing. 
":h" production rule system of Waterman appears to be the most complete 
sv-st.-r; for machine learning of heuristics. Claybrook's s^'steni needs more 
development in the area nf autt^matic generation nf heuristics, Witernan's 
system also has the advantage that it is probably better documented in the 
literature than the other two systems. 

^ • • r e d i t A s s i ^nme n t 

The author reels that credit assinnmcnt in Waterman's repres< nt if i .n is the 
most sophiscated and advanced; however, the reader should remember that this was 
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one ot the main thrusts of his research, while Claybrook and Winston 
were more interested in studying a particular problem area. 

It is not easy to discuss credit assignment with respect to Winston's 
language ao we will begin by discussing credit assignment as It relates to 
learning under Claybrook's representation, followed by Waterman's representation, 
and finally Winston's representation. 

The program POtYFACT has an analysis procedure that analyzes each 
factorization attempt and collects information on which features of polynomials 
appear co be the most important with respect to the chosen heuristic that directs 
the learning. One thing that every learning program must have to handle the 
credit assignment problem Is some sense of direction for directing learning. 
Either the user provides this sense of direction, as In POLYFACT, or the program 
can possibly learn it. In POLYFACT assigning credit to a feature that minimizes 
the search apace for a factor of the polynomial actually guides and produces the 
learning. Credit Is also given to "good" heuristics by ordering them according 
to their importance (Importance with respect to the sense of direction ). 

Waterman's technique for generating heuristics places more emphasis on 
credit assignment than the other representation techniques. His work deals more 
directly with machine learning of heuristics and determining which heuristic is 
responsible for a "bad" play in poker. We hasten to point out here another 
common characteristic not discussed in the previous section -- the heuristics 
in each of these three learning systems are placed into some order by the 
learning mechanism; thus, providing another reason for having heuristics that 
can be referenced individually. Credit assignment in Waterman's program can 
cause a production rule to be modified or cause inclusion of a new rule into the 
set of ordered heuristics. Section 4.2.3 provides a discussion of how his program 
assigns credit to heuristics. 
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Credit assignment In Winston's learning system occurs during model 
building* Relations can be reinforced by attaching must to a relation. 
The near misses in the training sequence are used to indicate those relations 
and properties that must and must not be In any example of the concept. Thus 
credit is assigned by reinforcing those relations that classify the examples 
either as positive or negative instances of the concept. 

5.0 APPROACHING THE PROBLEMS IN LEARNING SYSTEMS 

Since the major thrust of this paper Is to encourage learning as a problem 
solving tool, we need to describe how some of the problems mentioned above can 
be approached. What the author proposes is the development of a learning system 
composed of components, in program form, that can be reused over and over in 
different problem spaces with little or no changes to the components. Some of 
the problems such as supplying particular information, in the form of rules, etc*, 
r.^r eich problem domain still remain since current technology has not been 
developed to the point that a program can extract this information without help. 
We are not going to repeat our discussion on the importance of heuristic 
representation because we feel this has been adequately covered in Section 4.0. 
That dlicu. ton should provide the reader with enough information to select a 
representation of heuristics. Section 3,0 introduces the reader to problecia 
in learning and how to handle some of them* 

Most of the components of the learning system outlined below are In all 
three learning programs discussed in this paper « We believe a reusable learning 
system should be composed of the following con4>onents or have the following 
characteristics: 

h A classification mechanism capable of classifying objects into 

classes so that heuristics appropriate to each class can be applied. 
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2. localized learning aiaociated „ith each class of objecta in the 
classification mechanism so that global learning is not used. 

3. A method for representing powerful heuristics that can be created, 
modified, and executed dynamically. 

4. A technique for encoding heuristics to conserve storage. 

5. A simple procedure for referencing an individual heuristic (or 
a set of heuristics) and executing it (or them). 

6. At least one type of learning mechanism, e.g. generalized learning 
and/or concept learning (preferably both types). 

7. A procedure for allowing the learning mechanl8m(s) to direct the 
creation and mddification of heuristics. 

8. A procedure(s) capable of analyzing the results of a problem solution 
attempt to determine If any heuristics should be modified. 

Of the components listed above probably only (8) would need to be modified 
from one application to the next. 

6.0 SU>flARY 

This paper outlines some of the proi 'ems associated with implementing 
learning programs. We have stressed the importance of the choice of representation 
of heuristics for this choice affects the learning capabilities of any learning 
system. Three representations for heuristics were discussed In detail. A brief 
comparison of these techniques show that they have most of the important 
requirements of heuristics in conaon. Of considerable interest Is the 
departure from the simple linear evaluation function approach in the early 1960*8 
to the more powerful languagaa approach In the early 1970' s. 

Section 5.0 provides an outline of a learning system of reusable components. 
Learning programs, in general, are large and time consuming to develop. Thus, 
a possible approach to using learning la to reuse cosiponents without modifying 
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them when moving to different application areas. Of courae some programming 
effort* dependent on the application, is still required* but the main 
components of the learning system remain unchanged • 
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